Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 4829 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 160.5 KiB |
| Average record size in memory | 34.0 B |
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 1 |
msno has a high cardinality: 4760 distinct values | High cardinality |
num_25 is highly correlated with num_unq | High correlation |
num_50 is highly correlated with num_75 | High correlation |
num_75 is highly correlated with num_50 | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_25 and 2 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_25 is highly correlated with num_50 and 1 other fields | High correlation |
num_50 is highly correlated with num_25 | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_25 and 2 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_100 and 1 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_100 is highly correlated with num_unq | High correlation |
num_25 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_100 and 3 other fields | High correlation |
num_75 is highly correlated with num_unq and 1 other fields | High correlation |
num_50 is highly correlated with num_25 and 2 other fields | High correlation |
msno is uniformly distributed | Uniform |
df_index has unique values | Unique |
num_25 has 1215 (25.2%) zeros | Zeros |
num_50 has 2283 (47.3%) zeros | Zeros |
num_75 has 2595 (53.7%) zeros | Zeros |
num_985 has 2537 (52.5%) zeros | Zeros |
num_100 has 167 (3.5%) zeros | Zeros |
Reproduction
| Analysis started | 2023-05-18 18:15:46.395372 |
|---|---|
| Analysis finished | 2023-05-18 18:15:57.186335 |
| Duration | 10.79 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 4829 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2391687.621 |
| Minimum | 307 |
|---|---|
| Maximum | 4826962 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.9 KiB |
Quantile statistics
| Minimum | 307 |
|---|---|
| 5-th percentile | 241996 |
| Q1 | 1187990 |
| median | 2389166 |
| Q3 | 3577630 |
| 95-th percentile | 4573631.2 |
| Maximum | 4826962 |
| Range | 4826655 |
| Interquartile range (IQR) | 2389640 |
Descriptive statistics
| Standard deviation | 1388278.19 |
|---|---|
| Coefficient of variation (CV) | 0.5804596629 |
| Kurtosis | -1.201134541 |
| Mean | 2391687.621 |
| Median Absolute Deviation (MAD) | 1195287 |
| Skewness | 0.01530554185 |
| Sum | 1.154945952 × 10^10 |
| Variance | 1.927316333 × 10^12 |
| Monotonicity | Not monotonic |
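Several of the derived figures in the tables above follow directly from the others. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative and not part of the report.

```python
# Values copied from the df_index tables above.
mean = 2391687.621
std = 1388278.19
q1, q3 = 1187990, 3577630
minimum, maximum = 307, 4826962

cv = std / mean                  # coefficient of variation = standard deviation / mean
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range = Q3 - Q1
value_range = maximum - minimum  # range = maximum - minimum

print(f"CV       = {cv:.6f}")
print(f"Variance = {variance:.4e}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
```

The recomputed values agree with the reported CV, variance, IQR and range up to the rounding of the inputs.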
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2415780 | 1 | < 0.1% |
| 4264548 | 1 | < 0.1% |
| 3116461 | 1 | < 0.1% |
| 186133 | 1 | < 0.1% |
| 1194920 | 1 | < 0.1% |
| 3922786 | 1 | < 0.1% |
| 3495957 | 1 | < 0.1% |
| 3393630 | 1 | < 0.1% |
| 2007454 | 1 | < 0.1% |
| 3598039 | 1 | < 0.1% |
| Other values (4819) | 4819 |
| Value | Count | Frequency (%) |
| 307 | 1 | |
| 394 | 1 | |
| 874 | 1 | |
| 1108 | 1 | |
| 2279 | 1 | |
| 2309 | 1 | |
| 2481 | 1 | |
| 2791 | 1 | |
| 3626 | 1 | |
| 4305 | 1 |
| Value | Count | Frequency (%) |
| 4826962 | 1 | |
| 4826325 | 1 | |
| 4825797 | 1 | |
| 4825655 | 1 | |
| 4825411 | 1 | |
| 4825017 | 1 | |
| 4824660 | 1 | |
| 4823456 | 1 | |
| 4823215 | 1 | |
| 4820849 | 1 |
msno
Categorical
| Distinct | 4760 |
|---|---|
| Distinct (%) | 98.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.9 KiB |
| qa7Tqz37oD5iGZHBDazEyZOSxanjpfoJwU6sM4tlM3c= | 2 |
|---|---|
| elxZ+SjElCV79jtd1tiZGJVRc+FcqtfU2NnxGhmrgn0= | 2 |
| mpQI3OVbrSTDBE5+7oQ/uGmjgJfalefAkkfpjIjqTqw= | 2 |
| B3/t5W3l4DGgqZvSDNNqj74PEkcSMUdeD0HzARUXVF8= | 2 |
| Uqm5yx0ZfBNVbpHehhHZt2tQ9zDuwmpxnVlbar/YkEk= | 2 |
| Other values (4755) |
Length
| Max length | 44 |
|---|---|
| Median length | 44 |
| Mean length | 44 |
| Min length | 44 |
Characters and Unicode
| Total characters | 212476 |
|---|---|
| Distinct characters | 65 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
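As a minimal sketch of how such character properties can be read, the snippet below uses Python's standard `unicodedata` module to count the general category of each character in one identifier from the Sample rows. How the report itself computes these figures is not shown here, so treat this only as an illustration of the idea.

```python
import unicodedata
from collections import Counter

# One identifier taken from the "Sample" rows of this report.
msno = "tej+Qj4xcYOl/Eo/m2s7tOajEsQaaKdOgkInffLDF1E="

# unicodedata.category returns the two-letter general category of a code point,
# e.g. "Lu" (Uppercase Letter), "Ll" (Lowercase Letter), "Nd" (Decimal Number),
# "Sm" (Math Symbol) or "Po" (Other Punctuation).
category_counts = Counter(unicodedata.category(ch) for ch in msno)

for category, count in category_counts.most_common():
    print(category, count)
```

The five categories that appear correspond to the five listed under "Most occurring categories".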
Unique
| Unique | 4691 |
|---|---|
| Unique (%) | 97.1% |
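"Distinct" and "Unique" measure different things: distinct counts how many different values occur, while unique counts the values that occur exactly once. Here 4760 distinct minus 4691 unique leaves 69 identifiers that occur more than once, and 4691 + 2 × 69 = 4829 is consistent with each of those appearing exactly twice. The sketch below shows the distinction on toy data; the helper name and the sample list are hypothetical.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique): the number of different values,
    and the number of values that occur exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique

# Toy data (hypothetical, not taken from the profiled dataset).
sample = ["a", "b", "b", "c", "d", "d"]
distinct, unique = distinct_and_unique(sample)
print(distinct, unique)
```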
Sample
| 1st row | tej+Qj4xcYOl/Eo/m2s7tOajEsQaaKdOgkInffLDF1E= |
|---|---|
| 2nd row | pYFqXAy0HELNV4ZhfcrzOxeb81Oh8dLx+5hRaA9rV14= |
| 3rd row | fi4aWbGcpaowRmm9yBoZUkWoYveFvVu2TCU+3bWWvtk= |
| 4th row | sQ4z126F7JuRtOuDKhBBhKcaEw9jSrU5Tkal/uZhtfA= |
| 5th row | XCbodeB083/UlSDlanPQbWQDdNNLUdyOjbIJz1KW5/Q= |
Common Values
| Value | Count | Frequency (%) |
| qa7Tqz37oD5iGZHBDazEyZOSxanjpfoJwU6sM4tlM3c= | 2 | < 0.1% |
| elxZ+SjElCV79jtd1tiZGJVRc+FcqtfU2NnxGhmrgn0= | 2 | < 0.1% |
| mpQI3OVbrSTDBE5+7oQ/uGmjgJfalefAkkfpjIjqTqw= | 2 | < 0.1% |
| B3/t5W3l4DGgqZvSDNNqj74PEkcSMUdeD0HzARUXVF8= | 2 | < 0.1% |
| Uqm5yx0ZfBNVbpHehhHZt2tQ9zDuwmpxnVlbar/YkEk= | 2 | < 0.1% |
| sVW70i8voMj/mCcxh3hhKWEMv2BvXVbkMub3WR0lGFE= | 2 | < 0.1% |
| q+nzp/y18gZ1yEIoT5bjYYJZvqtKsF9NhQHqIePfuxw= | 2 | < 0.1% |
| gjvctqyEfdNQ0yT9B9Glqk03VgFKqW8HrxUWl37kr/o= | 2 | < 0.1% |
| SmO89dWA5GCbSMzD4NwjJAHjTgcpwD9JwogNPSiElu8= | 2 | < 0.1% |
| ffyA5RppImQqtkKOzihR5L7KOnaCzB85i9TGdBehAAo= | 2 | < 0.1% |
| Other values (4750) | 4809 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| qa7tqz37od5igzhbdazeyzosxanjpfojwu6sm4tlm3c | 2 | < 0.1% |
| elxz+sjelcv79jtd1tizgjvrc+fcqtfu2nnxghmrgn0 | 2 | < 0.1% |
| mpqi3ovbrstdbe5+7oq/ugmjgjfalefakkfpjijqtqw | 2 | < 0.1% |
| b3/t5w3l4dggqzvsdnnqj74pekcsmuded0hzaruxvf8 | 2 | < 0.1% |
| uqm5yx0zfbnvbphehhhzt2tq9zduwmpxnvlbar/ykek | 2 | < 0.1% |
| svw70i8vomj/mccxh3hhkwemv2bvxvbkmub3wr0lgfe | 2 | < 0.1% |
| q+nzp/y18gz1yeiot5bjyyjzvqtksf9nhqhqiepfuxw | 2 | < 0.1% |
| gjvctqyefdnq0yt9b9glqk03vgfkqw8hrxuwl37kr/o | 2 | < 0.1% |
| smo89dwa5gcbsmzd4nwjjahjtgcpwd9jwognpsielu8 | 2 | < 0.1% |
| ffya5rppimqqtkkozihr5l7konaczb85i9tgdbehaao | 2 | < 0.1% |
| Other values (4750) | 4809 |
Most occurring characters
| Value | Count | Frequency (%) |
| = | 4829 | 2.3% |
| 0 | 3569 | 1.7% |
| Q | 3528 | 1.7% |
| U | 3514 | 1.7% |
| I | 3506 | 1.7% |
| g | 3500 | 1.6% |
| w | 3488 | 1.6% |
| o | 3487 | 1.6% |
| A | 3479 | 1.6% |
| E | 3469 | 1.6% |
| Other values (55) | 176107 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 84609 | |
| Lowercase Letter | 84056 | |
| Decimal Number | 32707 | 15.4% |
| Math Symbol | 7987 | 3.8% |
| Other Punctuation | 3117 | 1.5% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| g | 3500 | 4.2% |
| w | 3488 | 4.1% |
| o | 3487 | 4.1% |
| k | 3450 | 4.1% |
| c | 3414 | 4.1% |
| s | 3361 | 4.0% |
| h | 3239 | 3.9% |
| z | 3235 | 3.8% |
| y | 3233 | 3.8% |
| t | 3220 | 3.8% |
| Other values (16) | 50429 |
Uppercase Letter
| Value | Count | Frequency (%) |
| Q | 3528 | 4.2% |
| U | 3514 | 4.2% |
| I | 3506 | 4.1% |
| A | 3479 | 4.1% |
| E | 3469 | 4.1% |
| Y | 3463 | 4.1% |
| M | 3443 | 4.1% |
| B | 3249 | 3.8% |
| V | 3242 | 3.8% |
| R | 3230 | 3.8% |
| Other values (16) | 50486 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 3569 | |
| 4 | 3414 | |
| 8 | 3384 | |
| 1 | 3256 | |
| 2 | 3211 | |
| 5 | 3185 | |
| 7 | 3182 | |
| 3 | 3176 | |
| 6 | 3176 | |
| 9 | 3154 |
Math Symbol
| Value | Count | Frequency (%) |
| = | 4829 | |
| + | 3158 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 3117 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 168665 | |
| Common | 43811 | 20.6% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| Q | 3528 | 2.1% |
| U | 3514 | 2.1% |
| I | 3506 | 2.1% |
| g | 3500 | 2.1% |
| w | 3488 | 2.1% |
| o | 3487 | 2.1% |
| A | 3479 | 2.1% |
| E | 3469 | 2.1% |
| Y | 3463 | 2.1% |
| k | 3450 | 2.0% |
| Other values (42) | 133781 |
Common
| Value | Count | Frequency (%) |
| = | 4829 | |
| 0 | 3569 | 8.1% |
| 4 | 3414 | 7.8% |
| 8 | 3384 | 7.7% |
| 1 | 3256 | 7.4% |
| 2 | 3211 | 7.3% |
| 5 | 3185 | 7.3% |
| 7 | 3182 | 7.3% |
| 3 | 3176 | 7.2% |
| 6 | 3176 | 7.2% |
| Other values (3) | 9429 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 212476 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| = | 4829 | 2.3% |
| 0 | 3569 | 1.7% |
| Q | 3528 | 1.7% |
| U | 3514 | 1.7% |
| I | 3506 | 1.7% |
| g | 3500 | 1.6% |
| w | 3488 | 1.6% |
| o | 3487 | 1.6% |
| A | 3479 | 1.6% |
| E | 3469 | 1.6% |
| Other values (55) | 176107 |
date
Real number (ℝ≥0)
| Distinct | 678 |
|---|---|
| Distinct (%) | 14.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20160880.03 |
| Minimum | 20150116 |
|---|---|
| Maximum | 20170228 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 19.0 KiB |
Quantile statistics
| Minimum | 20150116 |
|---|---|
| 5-th percentile | 20150830 |
| Q1 | 20160321 |
| median | 20160810 |
| Q3 | 20161127 |
| 95-th percentile | 20170211 |
| Maximum | 20170228 |
| Range | 20112 |
| Interquartile range (IQR) | 806 |
Descriptive statistics
| Standard deviation | 5349.685205 |
|---|---|
| Coefficient of variation (CV) | 0.0002653497862 |
| Kurtosis | 0.2369621034 |
| Mean | 20160880.03 |
| Median Absolute Deviation (MAD) | 400 |
| Skewness | -0.07126137547 |
| Sum | 9.735688966 × 10^10 |
| Variance | 28619131.79 |
| Monotonicity | Not monotonic |
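The values in this column look like calendar dates written as YYYYMMDD integers, which would explain why figures such as the range (20112) and the mean do not correspond to elapsed days or to a real date. Under that assumption, the following sketch parses the reported minimum and maximum and compares the numeric range with the calendar span; `parse_yyyymmdd` is an illustrative helper, not part of the report.

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert an integer such as 20170228 into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()

# Minimum and maximum taken from the quantile table above.
first_day = parse_yyyymmdd(20150116)
last_day = parse_yyyymmdd(20170228)

numeric_range = 20170228 - 20150116          # what the report calls "Range"
calendar_span = (last_day - first_day).days  # elapsed days between the two dates

print(numeric_range, calendar_span)
```

Parsing the column before profiling would let it be treated as a date rather than a real number.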
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 20170111 | 21 | 0.4% |
| 20161212 | 19 | 0.4% |
| 20161031 | 19 | 0.4% |
| 20160826 | 19 | 0.4% |
| 20170216 | 19 | 0.4% |
| 20170203 | 18 | 0.4% |
| 20161222 | 18 | 0.4% |
| 20170103 | 18 | 0.4% |
| 20170222 | 18 | 0.4% |
| 20170125 | 18 | 0.4% |
| Other values (668) | 4642 |
| Value | Count | Frequency (%) |
| 20150116 | 1 | |
| 20150126 | 1 | |
| 20150208 | 1 | |
| 20150210 | 1 | |
| 20150213 | 1 | |
| 20150215 | 1 | |
| 20150218 | 1 | |
| 20150225 | 1 | |
| 20150228 | 2 | |
| 20150302 | 1 |
| Value | Count | Frequency (%) |
| 20170228 | 14 | |
| 20170227 | 14 | |
| 20170226 | 10 | |
| 20170225 | 14 | |
| 20170224 | 16 | |
| 20170223 | 15 | |
| 20170222 | 18 | |
| 20170221 | 14 | |
| 20170220 | 12 | |
| 20170219 | 12 |
num_25
Real number (ℝ≥0)
| Distinct | 96 |
|---|---|
| Distinct (%) | 2.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.804721474 |
| Minimum | 0 |
|---|---|
| Maximum | 292 |
| Zeros | 1215 |
| Zeros (%) | 25.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 7 |
| 95-th percentile | 28.6 |
| Maximum | 292 |
| Range | 292 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 13.48338795 |
|---|---|
| Coefficient of variation (CV) | 1.981475363 |
| Kurtosis | 78.42058902 |
| Mean | 6.804721474 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 6.505306717 |
| Sum | 32860 |
| Variance | 181.8017507 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1215 | |
| 1 | 748 | |
| 2 | 492 | |
| 3 | 355 | 7.4% |
| 4 | 280 | 5.8% |
| 5 | 222 | 4.6% |
| 6 | 176 | 3.6% |
| 7 | 145 | 3.0% |
| 8 | 126 | 2.6% |
| 10 | 102 | 2.1% |
| Other values (86) | 968 |
| Value | Count | Frequency (%) |
| 0 | 1215 | |
| 1 | 748 | |
| 2 | 492 | |
| 3 | 355 | 7.4% |
| 4 | 280 | 5.8% |
| 5 | 222 | 4.6% |
| 6 | 176 | 3.6% |
| 7 | 145 | 3.0% |
| 8 | 126 | 2.6% |
| 9 | 98 | 2.0% |
| Value | Count | Frequency (%) |
| 292 | 1 | |
| 204 | 1 | |
| 190 | 1 | |
| 183 | 1 | |
| 145 | 1 | |
| 136 | 1 | |
| 121 | 1 | |
| 117 | 1 | |
| 106 | 1 | |
| 105 | 1 |
num_50
Real number (ℝ≥0)
| Distinct | 37 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.597018016 |
| Minimum | 0 |
|---|---|
| Maximum | 113 |
| Zeros | 2283 |
| Zeros (%) | 47.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 6 |
| Maximum | 113 |
| Range | 113 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 3.852978411 |
|---|---|
| Coefficient of variation (CV) | 2.412607981 |
| Kurtosis | 275.2539392 |
| Mean | 1.597018016 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 12.38751595 |
| Sum | 7712 |
| Variance | 14.84544264 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=37)
| Value | Count | Frequency (%) |
| 0 | 2283 | |
| 1 | 1098 | |
| 2 | 578 | 12.0% |
| 3 | 317 | 6.6% |
| 4 | 158 | 3.3% |
| 5 | 110 | 2.3% |
| 6 | 69 | 1.4% |
| 7 | 37 | 0.8% |
| 9 | 31 | 0.6% |
| 8 | 31 | 0.6% |
| Other values (27) | 117 | 2.4% |
| Value | Count | Frequency (%) |
| 0 | 2283 | |
| 1 | 1098 | |
| 2 | 578 | 12.0% |
| 3 | 317 | 6.6% |
| 4 | 158 | 3.3% |
| 5 | 110 | 2.3% |
| 6 | 69 | 1.4% |
| 7 | 37 | 0.8% |
| 8 | 31 | 0.6% |
| 9 | 31 | 0.6% |
| Value | Count | Frequency (%) |
| 113 | 1 | < 0.1% |
| 103 | 1 | < 0.1% |
| 63 | 1 | < 0.1% |
| 46 | 1 | < 0.1% |
| 44 | 1 | < 0.1% |
| 40 | 1 | < 0.1% |
| 38 | 1 | < 0.1% |
| 36 | 1 | < 0.1% |
| 32 | 3 | |
| 30 | 1 | < 0.1% |
num_75
Real number (ℝ≥0)
| Distinct | 24 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9803271899 |
| Minimum | 0 |
|---|---|
| Maximum | 43 |
| Zeros | 2595 |
| Zeros (%) | 53.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 43 |
| Range | 43 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.824822578 |
|---|---|
| Coefficient of variation (CV) | 1.86144238 |
| Kurtosis | 98.06264141 |
| Mean | 0.9803271899 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.717025463 |
| Sum | 4734 |
| Variance | 3.329977441 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=24)
| Value | Count | Frequency (%) |
| 0 | 2595 | |
| 1 | 1178 | |
| 2 | 499 | 10.3% |
| 3 | 261 | 5.4% |
| 4 | 123 | 2.5% |
| 5 | 65 | 1.3% |
| 6 | 41 | 0.8% |
| 8 | 20 | 0.4% |
| 7 | 19 | 0.4% |
| 9 | 5 | 0.1% |
| Other values (14) | 23 | 0.5% |
| Value | Count | Frequency (%) |
| 0 | 2595 | |
| 1 | 1178 | |
| 2 | 499 | 10.3% |
| 3 | 261 | 5.4% |
| 4 | 123 | 2.5% |
| 5 | 65 | 1.3% |
| 6 | 41 | 0.8% |
| 7 | 19 | 0.4% |
| 8 | 20 | 0.4% |
| 9 | 5 | 0.1% |
| Value | Count | Frequency (%) |
| 43 | 1 | |
| 33 | 1 | |
| 23 | 1 | |
| 22 | 1 | |
| 21 | 1 | |
| 19 | 1 | |
| 18 | 1 | |
| 17 | 1 | |
| 15 | 1 | |
| 14 | 1 |
num_985
Real number (ℝ≥0)
| Distinct | 27 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.08987368 |
| Minimum | 0 |
|---|---|
| Maximum | 75 |
| Zeros | 2537 |
| Zeros (%) | 52.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 75 |
| Range | 75 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 2.50637112 |
|---|---|
| Coefficient of variation (CV) | 2.299689557 |
| Kurtosis | 279.9932589 |
| Mean | 1.08987368 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 12.65345601 |
| Sum | 5263 |
| Variance | 6.281896194 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=27)
| Value | Count | Frequency (%) |
| 0 | 2537 | |
| 1 | 1220 | |
| 2 | 460 | 9.5% |
| 3 | 244 | 5.1% |
| 4 | 151 | 3.1% |
| 5 | 89 | 1.8% |
| 6 | 44 | 0.9% |
| 7 | 23 | 0.5% |
| 8 | 11 | 0.2% |
| 11 | 10 | 0.2% |
| Other values (17) | 40 | 0.8% |
| Value | Count | Frequency (%) |
| 0 | 2537 | |
| 1 | 1220 | |
| 2 | 460 | 9.5% |
| 3 | 244 | 5.1% |
| 4 | 151 | 3.1% |
| 5 | 89 | 1.8% |
| 6 | 44 | 0.9% |
| 7 | 23 | 0.5% |
| 8 | 11 | 0.2% |
| 9 | 9 | 0.2% |
| Value | Count | Frequency (%) |
| 75 | 1 | |
| 61 | 1 | |
| 47 | 1 | |
| 42 | 1 | |
| 37 | 1 | |
| 30 | 1 | |
| 26 | 1 | |
| 25 | 1 | |
| 18 | 1 | |
| 17 | 1 |
num_100
Real number (ℝ≥0)
| Distinct | 201 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.30979499 |
| Minimum | 0 |
|---|---|
| Maximum | 375 |
| Zeros | 167 |
| Zeros (%) | 3.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 16 |
| Q3 | 35 |
| 95-th percentile | 101 |
| Maximum | 375 |
| Range | 375 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 35.71394146 |
|---|---|
| Coefficient of variation (CV) | 1.261540095 |
| Kurtosis | 11.79313726 |
| Mean | 28.30979499 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 2.813764771 |
| Sum | 136708 |
| Variance | 1275.485614 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 231 | 4.8% |
| 5 | 194 | 4.0% |
| 4 | 176 | 3.6% |
| 3 | 172 | 3.6% |
| 8 | 168 | 3.5% |
| 0 | 167 | 3.5% |
| 2 | 159 | 3.3% |
| 7 | 148 | 3.1% |
| 6 | 145 | 3.0% |
| 11 | 131 | 2.7% |
| Other values (191) | 3138 |
| Value | Count | Frequency (%) |
| 0 | 167 | |
| 1 | 231 | |
| 2 | 159 | |
| 3 | 172 | |
| 4 | 176 | |
| 5 | 194 | |
| 6 | 145 | |
| 7 | 148 | |
| 8 | 168 | |
| 9 | 124 |
| Value | Count | Frequency (%) |
| 375 | 1 | |
| 373 | 1 | |
| 313 | 1 | |
| 290 | 1 | |
| 280 | 1 | |
| 279 | 2 | |
| 266 | 1 | |
| 243 | 1 | |
| 242 | 1 | |
| 241 | 2 |
num_unq
Real number (ℝ≥0)
| Distinct | 176 |
|---|---|
| Distinct (%) | 3.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.53365086 |
| Minimum | 1 |
|---|---|
| Maximum | 299 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 19 |
| Q3 | 39 |
| 95-th percentile | 90 |
| Maximum | 299 |
| Range | 298 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 30.28583036 |
|---|---|
| Coefficient of variation (CV) | 1.061407477 |
| Kurtosis | 8.142531258 |
| Mean | 28.53365086 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 2.306616814 |
| Sum | 137789 |
| Variance | 917.2315207 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 200 | 4.1% |
| 3 | 183 | 3.8% |
| 5 | 165 | 3.4% |
| 2 | 160 | 3.3% |
| 8 | 152 | 3.1% |
| 4 | 151 | 3.1% |
| 7 | 145 | 3.0% |
| 9 | 140 | 2.9% |
| 6 | 124 | 2.6% |
| 11 | 121 | 2.5% |
| Other values (166) | 3288 |
| Value | Count | Frequency (%) |
| 1 | 200 | |
| 2 | 160 | |
| 3 | 183 | |
| 4 | 151 | |
| 5 | 165 | |
| 6 | 124 | |
| 7 | 145 | |
| 8 | 152 | |
| 9 | 140 | |
| 10 | 116 |
| Value | Count | Frequency (%) |
| 299 | 1 | |
| 284 | 1 | |
| 281 | 1 | |
| 242 | 1 | |
| 218 | 1 | |
| 207 | 1 | |
| 196 | 1 | |
| 195 | 1 | |
| 194 | 1 | |
| 191 | 1 |
total_secs
Real number (ℝ≥0)
| Distinct | 3383 |
|---|---|
| Distinct (%) | 70.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 3 |
| Infinite (%) | 0.1% |
| Mean | inf |
| Minimum | 0.7431640625 |
|---|---|
| Maximum | inf |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0.7431640625 |
|---|---|
| 5-th percentile | 342.4 |
| Q1 | 1881 |
| median | 4496 |
| Q3 | 9432 |
| 95-th percentile | 26019.2 |
| Maximum | inf |
| Range | inf |
| Interquartile range (IQR) | 7551 |
Descriptive statistics
| Standard deviation | nan |
|---|---|
| Coefficient of variation (CV) | nan |
| Kurtosis | nan |
| Mean | inf |
| Median Absolute Deviation (MAD) | 3132 |
| Skewness | nan |
| Sum | inf |
| Variance | nan |
| Monotonicity | Not monotonic |
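The `inf` and `nan` entries above follow from the three infinite values: a single infinite term makes the sum and mean infinite, and subtracting an infinite mean from an infinite value yields `nan`, which propagates into the variance, standard deviation, skewness and kurtosis. The sketch below reproduces this on toy values and shows one possible way to summarize only the finite entries; `finite_summary` and the sample list are hypothetical.

```python
import math

def finite_summary(values):
    """Mean and sample standard deviation over the finite values only,
    plus the number of non-finite values that were skipped."""
    finite = [v for v in values if math.isfinite(v)]
    n = len(finite)
    mean = sum(finite) / n
    variance = sum((v - mean) ** 2 for v in finite) / (n - 1)
    return mean, math.sqrt(variance), len(values) - n

# Toy values (hypothetical); one entry is infinite, as in this column.
values = [1881.0, 4496.0, 9432.0, math.inf]

naive_mean = sum(values) / len(values)  # inf: the infinite term dominates the sum
naive_dev = values[-1] - naive_mean     # inf - inf is nan, which poisons the variance

mean, std, skipped = finite_summary(values)
print(naive_mean, naive_dev, mean, std, skipped)
```

Whether the infinite values should be dropped, capped or traced back to an upstream overflow is a data-cleaning decision that this report alone cannot settle.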
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 8288 | 6 | 0.1% |
| 5508 | 5 | 0.1% |
| 4496 | 5 | 0.1% |
| 2710 | 5 | 0.1% |
| 2122 | 5 | 0.1% |
| 8344 | 5 | 0.1% |
| 8920 | 5 | 0.1% |
| 4672 | 5 | 0.1% |
| 8880 | 5 | 0.1% |
| 4772 | 5 | 0.1% |
| Other values (3373) | 4778 |
| Value | Count | Frequency (%) |
| 0.7431640625 | 1 | |
| 0.8359375 | 1 | |
| 1.838867188 | 1 | |
| 2.181640625 | 1 | |
| 2.66796875 | 1 | |
| 3.8125 | 1 | |
| 4.41015625 | 1 | |
| 4.625 | 1 | |
| 6.28125 | 1 | |
| 7.62890625 | 1 |
| Value | Count | Frequency (%) |
| inf | 3 | |
| 64640 | 1 | < 0.1% |
| 63040 | 1 | < 0.1% |
| 62720 | 1 | < 0.1% |
| 62048 | 1 | < 0.1% |
| 60928 | 1 | < 0.1% |
| 60800 | 1 | < 0.1% |
| 60128 | 1 | < 0.1% |
| 59008 | 1 | < 0.1% |
| 58048 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
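The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the formula, not the code the report uses; the function name and the toy data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# Toy data (hypothetical).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson_r(xs, ys)
r_rescaled = pearson_r([10 * x + 7 for x in xs], ys)  # shift and rescale X
print(r, r_rescaled)
```

Both calls give the same coefficient, which illustrates the invariance under a change of location and (positive) scale.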
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
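A minimal sketch of that procedure: replace each variable by its ranks (ties share the average of their positions, a common convention assumed here), then apply the covariance-over-standard-deviations formula to the ranks. The function names and the toy data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Toy data (hypothetical): monotonic but nonlinear.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.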
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
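The pair-counting definition above can be sketched directly. This follows the formula exactly as stated (often called τ-a); statistical libraries commonly apply a tie-adjusted variant instead, so results on data with many ties may differ. The function name and the toy data are assumptions.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1    # both variables move in the same direction
        elif product < 0:
            discordant += 1    # the variables move in opposite directions
        # product == 0 is a tie and counts towards neither
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

# Toy data (hypothetical): one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```

With five concordant pairs and one discordant pair out of six, the result is 4/6.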
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | df_index | msno | date | num_25 | num_50 | num_75 | num_985 | num_100 | num_unq | total_secs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2415780 | tej+Qj4xcYOl/Eo/m2s7tOajEsQaaKdOgkInffLDF1E= | 20170129 | 1 | 1 | 0 | 1 | 32 | 17 | 8368.0000 |
| 1 | 4584805 | pYFqXAy0HELNV4ZhfcrzOxeb81Oh8dLx+5hRaA9rV14= | 20170212 | 4 | 2 | 0 | 0 | 5 | 6 | 1396.0000 |
| 2 | 1725913 | fi4aWbGcpaowRmm9yBoZUkWoYveFvVu2TCU+3bWWvtk= | 20170206 | 0 | 0 | 0 | 0 | 10 | 10 | 2768.0000 |
| 3 | 3616910 | sQ4z126F7JuRtOuDKhBBhKcaEw9jSrU5Tkal/uZhtfA= | 20161223 | 0 | 0 | 1 | 0 | 5 | 6 | 1215.0000 |
| 4 | 1068647 | XCbodeB083/UlSDlanPQbWQDdNNLUdyOjbIJz1KW5/Q= | 20160530 | 1 | 0 | 0 | 0 | 0 | 1 | 14.1875 |
| 5 | 2120337 | HEy4RCCczLtnKgTNc1e8cqF89rHzfaz0l+YnQk+VqZg= | 20160823 | 0 | 0 | 1 | 0 | 1 | 2 | 403.0000 |
| 6 | 1516916 | dM3SfMSYNrASG21pjaRKCDLybShNM3UWrQ6B6qQy0xA= | 20160403 | 0 | 0 | 0 | 0 | 42 | 42 | 9840.0000 |
| 7 | 2636443 | v4td4+kAOP1ZQ65i23wR98oCC/683N9ckiUG/wXz1zg= | 20160522 | 0 | 0 | 0 | 0 | 4 | 4 | 802.0000 |
| 8 | 1360093 | A5g2OKmuW+tNykWvB3VGLqqg3L/Q51w5EkKTvHl6P1s= | 20160820 | 7 | 0 | 0 | 1 | 14 | 19 | 4168.0000 |
| 9 | 4154110 | 2b7ZDK/hU1mPFSJ49g0hrnJfiIb4dKpHAmnbNM2D9XY= | 20161202 | 0 | 0 | 1 | 0 | 4 | 5 | 1161.0000 |
Last rows
| | df_index | msno | date | num_25 | num_50 | num_75 | num_985 | num_100 | num_unq | total_secs |
|---|---|---|---|---|---|---|---|---|---|---|
| 4819 | 4602247 | 3Qvzm/ImdX+QpPhA5PEdxanffJ5gClVLLGYQt3NYfVY= | 20160623 | 4 | 0 | 1 | 1 | 22 | 28 | 5936.0 |
| 4820 | 3139202 | fk/UAHsgoKFbsUZ5zaWQwqqq+LDY2frHi+PC8sA+msg= | 20161228 | 3 | 0 | 0 | 0 | 11 | 12 | 2836.0 |
| 4821 | 1459242 | nlng/ax+vA/dN9usZiUIOfTnfi4pkeG58DUPCsOiVJ0= | 20151222 | 12 | 0 | 0 | 0 | 166 | 49 | 42112.0 |
| 4822 | 3647998 | jYSQWJvS3O+AhzKMxE1UVBUBQiSRw4yLiyAqce2cy/8= | 20161112 | 4 | 1 | 1 | 3 | 9 | 18 | 3052.0 |
| 4823 | 2071228 | /EJJnh5LlU677MYh/XUCrWwhyBLFTBCpdWWPmtrjmdc= | 20160302 | 0 | 0 | 0 | 0 | 29 | 24 | 7176.0 |
| 4824 | 2736932 | MlrUpeXRX5kzSDrxie2Ie6imxsz4Uyq6xNU19Mv28ZU= | 20160315 | 0 | 0 | 0 | 1 | 2 | 3 | 777.5 |
| 4825 | 2863801 | mt9qBgkjws2VzYX3PHiGw3VrT/d8ut/LlvPzVdwYz0w= | 20160411 | 23 | 13 | 1 | 1 | 0 | 36 | 1917.0 |
| 4826 | 1701042 | YCzjkI8TLKPe8L/6s8lAlpqffOrGzFWkMsiYdjtgU8A= | 20160315 | 0 | 0 | 0 | 0 | 23 | 22 | 5532.0 |
| 4827 | 2098588 | pEOXyeH30m9Gg5pzMuPKdJzJv8z7lkg2SJ7d4+9rHcI= | 20161218 | 35 | 1 | 1 | 5 | 86 | 87 | 21904.0 |
| 4828 | 2515098 | 6eLykvU/uxBfjHpRpYstV++XmVbI7NDWpgVbTCBIqQM= | 20161212 | 5 | 1 | 1 | 1 | 21 | 17 | 5876.0 |